Automatic transformation of lecture transcription into document style using statistical framework

نویسندگان

Tatsuya Kawahara

Kazuya Shitaoka

Hiroaki Nanjo

چکیده

This paper addresses automatic transformation from spoken style texts to written style texts. Exact transcriptions and speech recognition results of live lectures include many spoken language expressions, and thus, are not suitable for documents and need to be edited. In this paper, we present a method of applying of the statistical approach used in machine translation to this post-processing task. Specifically, we implement the correction of colloquial expressions, the deletion of fillers, the insertion of periods, and the insertion of particles in an integrated manner. A preliminary evaluation confirms that the statistical transformation framework works well and we achieved high recall and precision rate of period and particle insertion.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Transcription of Lecture Speech using Language Model Based on Speaking-Style Transformation of Proceeding Texts

For language modeling of spontaneous speech recognition, we propose a style transformation approach, which transforms written texts to a spoken-style language model. Since these two styles are largely different and thus direct transformation is difficult, we cascade two transformation methods; rule-based transformation to rewrite written-style texts to intermediate “verbatim” texts, and statist...

متن کامل

A monotonic statistical machine translation approach to speaking style transformation

This paper presents a method for automatically transforming faithful transcripts or ASR results into clean transcripts for human consumption using a framework we label speaking style transformation (SST). We perform a detailed analysis of the types of corrections performed by human stenographers when creating clean transcripts, and propose a model that is able to handle the majority of the most...

متن کامل

The ISL Baseline Lecture Transcription System for the TED Corpus

This paper describes the Interactive Systems Laboratories’ automatic lecture transcription system for the Translanguage English Database (TED) corpus, which provides text-hypothesis for the International Workshop on Speech Summarization for Information Extraction and Machine Translation. Furthermore the paper gives a short analysis of speaking style characteristics, in particular addressing nat...

متن کامل

Towards Automatic Transformation between Different Transcription Conventions: Prediction of Intonation Markers from Linguistic and Acoustic Features

Because of the tremendous effort required for recording and transcription, large-scale spoken language corpora have been hardly developed in Japanese, with a notable exception of the Corpus of Spontaneous Japanese (CSJ). Various research groups have individually developed conversation corpora in Japanese, but these corpora are transcribed by different conventions and have few annotations in com...

متن کامل

Provide a model for the establishment of the school in accordance with the indicators and requirements of the Education Transformation Document

Purpose: The aim of this study was to provide a model for school establishment in accordance with the indicators and requirements of the Education Transformation Document. Methodology: The research method was basic-applied in terms of purpose, descriptive-survey in terms of data collection method and combined in terms of data type. The statistical population of the study in the qualitative sect...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2004

Automatic transformation of lecture transcription into document style using statistical framework

نویسندگان

چکیده

منابع مشابه

Automatic Transcription of Lecture Speech using Language Model Based on Speaking-Style Transformation of Proceeding Texts

A monotonic statistical machine translation approach to speaking style transformation

The ISL Baseline Lecture Transcription System for the TED Corpus

Towards Automatic Transformation between Different Transcription Conventions: Prediction of Intonation Markers from Linguistic and Acoustic Features

Provide a model for the establishment of the school in accordance with the indicators and requirements of the Education Transformation Document

عنوان ژورنال:

اشتراک گذاری